Assessing Disclosure Protection for a SOI Public Use File

Authors

  • Marianne Winglee
  • Richard Valliant
  • Jay Clark
  • Yunhee Lim
  • Michael Weber
Abstract

This paper describes an evaluation of the disclosure protection methods for the Individual Tax Model Public Use File (PUF) released by the Statistics of Income (SOI) Program of the Internal Revenue Service. The purpose of this evaluation is to explore options to strengthen disclosure protection while limiting information loss for tax returns with high incomes. We first present the introduction and motivation for this study. We then discuss the preparation of the PUF, options for subsampling high-income returns (from samples in an internal non-PUF file), and options for disclosure protection by microaggregation (grouping microdata in aggregates of three). We also discuss the method and data used to measure disclosure risk and information loss. We then discuss our results and recommendations for further research. Finally, we list references used in this paper.

The first Individual Income Tax Return PUF was created in 1960. Disclosure control did not receive the attention then that it does today. Basic precautions were taken, such as the removal of obvious identifiers (name, address, and Social Security number), but little more than that. During the mid-1980s, SOI undertook a reevaluation of its disclosure control procedures (Strudler, Oh, and Scheuren, 1986). Subsequently, no record was given a weight of less than three, all amount fields were rounded to four significant digits, top coding was applied for selected codes, and some fields were eliminated for high-income records. In addition, certain fields were blurred or microaggregated in groups of three.

During the 1990s, SOI, along with all of the other statistical agencies that release PUFs, reexamined its disclosure control procedures in light of technological changes (increased computer power, decreased storage costs, advances in record linkage techniques, and the proliferation of information networks such as the Internet).
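Two of the protections named above, rounding amounts to four significant digits and microaggregating in groups of three, can be illustrated with a minimal sketch. It assumes a single numeric field and groups records by sorted value; the function names `round_sig` and `microaggregate` are hypothetical, and the abstract does not say how SOI forms its groups, so this is not a description of the actual PUF procedure.

```python
import math


def round_sig(x, digits=4):
    """Round x to the given number of significant digits."""
    if x == 0:
        return 0
    # Position of the leading digit, e.g. 5 for 123456.
    magnitude = math.floor(math.log10(abs(x)))
    return round(x, digits - 1 - magnitude)


def microaggregate(values, k=3):
    """Replace each value with the mean of its group of at least k records.

    Assumption: groups are formed from consecutive records after sorting
    on this one field (univariate microaggregation).
    """
    n = len(values)
    if n < k:
        raise ValueError("need at least k records to form a group")
    order = sorted(range(n), key=lambda i: values[i])
    result = [0.0] * n
    start = 0
    while start < n:
        end = start + k
        # Fold a short tail into the final group so no group has fewer than k members.
        if n - end < k:
            end = n
        group = order[start:end]
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            result[i] = mean
        start = end
    return result


amounts = [10, 50, 20, 40, 30, 60, 70]
protected = microaggregate(amounts)
```

Because every value is replaced by its group mean, the field total is preserved, while no released value can be traced to fewer than three underlying records.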
SOI’s current approach is to determine which items in the PUF can be obtained by an outside intruder. After the suspect fields have been identified, an extract containing these fields for all taxpayers is made from the IRS Individual Master File. This extract and the as yet unreleased PUF are then matched using record linkage software. If the results cause alarm, additional blurring or subsampling is performed. This process provides SOI with what it believes is a limited but objective measure of disclosure risk.

An obvious question is how much each disclosure procedure contributes to reducing the risk of disclosure. For example, if the subsampling procedure limited records to a minimum weight of 5 instead of 3, how would the disclosure risk measurement change? If the records were microaggregated in larger groups and in a less rigid hierarchical order, how would the disclosure risk measurement change? The next obvious question is what impact disclosure control procedures have on data quality. In the end, disclosure control is a constant effort to produce PUFs that retain as many qualities of the original data as possible while maintaining confidentiality. What follows are some of the results of our attempt to answer these questions.
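The matching step described above can be sketched as a nearest-neighbor search: each protected PUF record is compared with every record in the population extract, and it counts as at risk when its closest match is its own source return. This is a minimal illustration under stated assumptions. The function `nearest_match_rate`, the standardized Euclidean distance, and the toy data are all hypothetical; the abstract does not specify the record linkage software or matching rules SOI uses.

```python
import math


def nearest_match_rate(puf, population):
    """Share of PUF records whose nearest population record is their true source.

    puf: list of (source_id, [field values]) for protected records.
    population: dict mapping source_id -> [field values] for all returns.
    """
    ids = list(population)
    n_fields = len(population[ids[0]])

    # Scale each field by its population standard deviation so that
    # no single large-valued field dominates the distance.
    scales = []
    for j in range(n_fields):
        column = [population[i][j] for i in ids]
        mean = sum(column) / len(column)
        sd = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
        scales.append(sd or 1.0)

    def distance(a, b):
        return math.sqrt(sum(((x - y) / s) ** 2 for x, y, s in zip(a, b, scales)))

    hits = 0
    for source_id, fields in puf:
        best = min(ids, key=lambda i: distance(fields, population[i]))
        if best == source_id:
            hits += 1
    return hits / len(puf)


# Toy data: record "b" has been perturbed enough to resemble "c".
population = {"a": [100, 10], "b": [200, 20], "c": [300, 30], "d": [1000, 90]}
puf = [("a", [105, 11]), ("b", [290, 29]), ("d", [990, 88])]
rate = nearest_match_rate(puf, population)
```

Rerunning such a measure after changing one protection setting, for example a larger microaggregation group, gives a way to compare settings on the same risk scale.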

Similar resources

Measuring Disclosure Risk and an Examination of the Possibilities of Using Synthetic Data in the Individual Income Tax Return Public Use File

The Statistics of Income Division (SOI) currently measures disclosure risk through a distance-based technique that compares the public use file (PUF) against the population of all tax returns and uses top-coding, subsampling and multivariate microaggregation as disclosure avoidance techniques. SOI is interested in exploring the use of other techniques that prevent disclosure while providing les...

A CRONYM : Data without Boundaries D

Disclosure limitation methods for protecting the confidentiality of respondents in survey microdata often use perturbative techniques which introduce measurement error into the categorical identifying variables. In addition, the data itself will often have measurement errors commonly arising from survey processes. There is a need for valid and practical ways to assess the protect...

Assessing the Statistical Disclosure Risk of a Demographic Microdata File

There are two recent developments related to survey data dissemination that may be increasing the risk of disclosure of respondent data. One is that statistical agencies are now releasing more microdata files than previously, partly in response to the urging of researchers needing the data for precise analytic work. For example, some data rich files with possibly high disclosure risk, that have...

Microdata Protection

Governmental, public, and private organizations are more and more frequently required to make data available for external release in a selective and secure fashion. Most data are today released in the form of microdata, reporting information on individual respondents. The protection of microdata against improper disclosure is therefore an issue that has become increasingly important and will co...

The Hippocratic File System: Protecting Privacy in Networked Storage

Privacy protection is increasingly difficult in today’s information society. In this paper, we look at an important link in the chain of information protection: the file system, and propose mechanisms to enhance the disclosure control of personal data. The scheme, called the Hippocratic File System, stores personal data’s purpose and use limitation as the data’s label, propagates the label as t...

Publication date: 2002